Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Include vendored typeshed stubs as a zipfile in the Ruff binary #11779

Merged
merged 9 commits into from
Jun 7, 2024

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Jun 6, 2024

Summary

Work towards #11653.

This PR adds a build.rs script to the red_knot crate that compresses our vendored typeshed stubs into a zip that is then included in the Ruff binary via include_bytes!() in module.rs. This will enable us to resolve modules to vendored stdlib stubs from typeshed in the future.

Details:

  • The build.rs script is run before anything else in the crate is built
  • We print a cargo:rerun-if-changed directive to stdout to let Cargo know that the build.rs only needs to be rerun if the contents of crates/red_knot/vendor/typeshed changes, or if the build.rs script itself changes. Docs here: https://doc.rust-lang.org/cargo/reference/build-scripts.html#rerun-if-changed. We need to use a syntax for this directive that's officially deprecated, due to the fact that our pinned minimum Rust version is old enough that it doesn't support the new version. (The deprecated syntax is cargo:rerun-if-changed=, with one colon. The new syntax uses cargo::rerun-if-changed=, with two colons.)
  • The zipfile that is created during the build step is .gitignored. Unfortunately there will be a need to manually keep that .gitignore entry in sync with the hardcoded path in build.rs.
  • The PR adds a new Ruff dependency on the zip crate. Following the precedent in Concerns with zip version 1.x uv#3642, I pinned to an old version of the zip crate, and added a Renovate rule to make sure that we don't get dependency-update PRs for that crate.

Test Plan

Some basic tests have been added that show that the zip archive containing typeshed is available from the Ruff after it has been built, and that the zip archive has some stdlib stubs in it.

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Jun 6, 2024
@carljm
Copy link
Contributor

carljm commented Jun 6, 2024

Looks good! Did you (manually) verify that changing a file in the vendored typeshed directory triggers a rebuild of the zip and the changed contents show up in the binary? (Eg delete update_wrapper from the functools stub and see the added test fail.)

@AlexWaygood
Copy link
Member Author

Looks good! Did you (manually) verify that changing a file in the vendored typeshed directory triggers a rebuild of the zip and the changed contents show up in the binary? (Eg delete update_wrapper from the functools stub and see the added test fail.)

Tried that out, and it looks good!

image

Now to make it work on Windows...

Copy link
Contributor

github-actions bot commented Jun 6, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@carljm
Copy link
Contributor

carljm commented Jun 6, 2024

🎉

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I've some questions before approving

  • What's the binary size increase?
  • What's the build time increase on an incremental (change to a rust file) and clean build. You can get the timings with cargo build --timings -p red_knot
  • Did you open a created archive. Does it look correct?

.github/renovate.json5 Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
Cargo.lock Show resolved Hide resolved
crates/red_knot/build.rs Outdated Show resolved Hide resolved
// Write file or directory explicitly
// Some unzip tools unzip files with directory paths correctly, some do not!
if path.is_file() {
println!("adding file {path:?} as {name:?} ...");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the println statements? Especially if they clutter the CLI output

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a build script, these aren't actually printed to stdout during the build process unless the build actually fails -- it's more like logging, really. But if the build fails, then the entire captured stdout from the build is printed in a summary, e.g.

image

crates/red_knot/build.rs Outdated Show resolved Hide resolved
crates/red_knot/build.rs Outdated Show resolved Hide resolved
crates/red_knot/build.rs Outdated Show resolved Hide resolved
crates/red_knot/src/module.rs Outdated Show resolved Hide resolved
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Jun 7, 2024

  • What's the binary size increase?

On main, after running cargo clean && cargo build --all-features:

  • The executable created for red_knot is 9.565 MB
  • The executable created for ruff is 77.481 MB

With this PR branch:

  • The executable created for red_knot is 9.565 MB
  • The executable created for ruff is 77.487 MB

The TYPESHED_ZIP_BYTES array in module.rs appears to be of length 617,342.

  • What's the build time increase on an incremental (change to a rust file) and clean build. You can get the timings with cargo build --timings -p red_knot

On main, cargo clean && cargo build --timings -p red_knot reports the total time as being 9.7s. With this PR branch, it increases to 10.5s. The time taken for an incremental build after a change to a Rust file (that isn't build.rs) seems noisy/unchanged -- it took 0.95s for me on main and 0.81s on this PR branch.

@AlexWaygood
Copy link
Member Author

Did you open a created archive. Does it look correct?

We're definitely able to inspect the contents of files in the zip at runtime using the zip crate that we used to create the zip. From what I can tell, everything is as we'd expect it to be. I assert as such in the tests I added, and the tests correctly fail if I change the contents of the vendored stdlib/functools.pyi file.

I also managed to inspect the filenames in the zip using Python's zipfile module, and everything was as I would expect it to be. (Except that on my machine, the zip includes a .DS_STORE file... but probably not a huge issue, and unlikely to appear in the binaries we publish to PyPI.)

I've been struggling to extract the zip archive or inspect the contents of any of the files in the zip from outside Rust -- most tools for unzipping zip archives don't seem to expect the zstd algorithm to be used, and I havne't succeeded in using zstandard tools to extract the zip either. I don't know if that's a concern or not.

@AlexWaygood AlexWaygood requested a review from MichaReiser June 7, 2024 12:17
@MichaReiser
Copy link
Member

MichaReiser commented Jun 7, 2024

What's the binary size increase?

Sorry, I should have been more explicit. Can you compare the release build numbers.

Also, The red knot binary size is identical. I think the typeshed files should be larger than 0 bytes? Ohhh, maybe the compiler optimizes them away (or isn't part of the executable anymore if you moved it to tests)

@AlexWaygood
Copy link
Member Author

Also, The red knot binary size is identical. I think the typeshed files should be larger than 0 bytes? Ohhh, maybe the compiler optimizes them away (or isn't part of the executable anymore if you moved it to tests)

Yeah, this confused me as well, since the length of the TYPESHED_ZIP_BYTES array is definitely not zero. I deliberately passed --tests so I think it should be included in the binary. Not sure.

Sorry, I should have been more explicit. Can you compare the release build numbers.

No worries, I'll do that!

@AlexWaygood
Copy link
Member Author

With cargo build --release --all-features --all-targets.

main:

  • red_knot binary: 3.221 MB
  • ruff binary: 25.195 MB

PR branch:

  • red_knot binary: 3.221 MB
  • ruff binary: 25.196 MB

So, still not showing up in the size of the binary (even though it's definitely compiling all the test features as part of that command.

Still, we know how large the zip is from this:

The TYPESHED_ZIP_BYTES array in module.rs appears to be of length 617,342.

@AlexWaygood
Copy link
Member Author

In c5ad832, I switched to setting the location of the zip (relative to cargo's OUT_DIR environment variable) via an environment variable set at compile time in .cargo/config.toml. This feels slightly nicer than having to hardcode the location manually in both crates/red_knot/build.rs and crates/red_knot/src/module.rs. @MichaReiser, are you okay with that change?

@MichaReiser
Copy link
Member

MichaReiser commented Jun 7, 2024

@AlexWaygood what's the benefit of the environment variable for the name of the zip? It feels like unnecessary complexity to me, considering that we don't have a use case for a different name.

If it's just to avoid the path repetition, then I think adding a comment to build.rs or the other way around would be sufficient.

Edit: Like I'm not against it, but it makes the code harder to follow. I then need to search for where the environment variable comes from and understand if there's actually a use case where we want to change the path. Having it hard coded makes it very explicit and failing to update the path in one file will also give you compile errors right away.

@AlexWaygood
Copy link
Member Author

Yes, it was just to avoid the repetition, and to make it so that there was a "single source of truth" for the location. It felt more "principled" to me. But I also wasn't sure if it was worth the new complexity, so I'll revert it if you find it less readable 👍

@MichaReiser
Copy link
Member

MichaReiser commented Jun 7, 2024

Yes, it was just to avoid the repetition, and to make it so that there was a "single source of truth" for the location. It felt more "principled" to me. But I also wasn't sure if it was worth the new complexity, so I'll revert it if you find it less readable

I think it's just slightly over-engineered ;)

@AlexWaygood
Copy link
Member Author

Thanks both for the reviews!

@AlexWaygood AlexWaygood enabled auto-merge (squash) June 7, 2024 14:58
@AlexWaygood AlexWaygood merged commit 37d8de3 into main Jun 7, 2024
17 checks passed
@AlexWaygood AlexWaygood deleted the zip-typeshed branch June 7, 2024 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants